QTM 385 - Experimental Methods
Lecture 08 - Blocking, Clustering (cont.), and Statistical Power
Danilo Freire
Emory University
Hi, there!
Nice to see you again! 😉
Group 1
| Elizabeth Shin | EJSHIN6 | 2520422 | elizabeth.shin@emory.edu |
| Emily Choi | ECHOI73 | 2492522 | emily.choi@emory.edu |
| Esther Yang | QYANG68 | 2487073 | esther.yang2@emory.edu |
| Zhiyi Li (Yolanda Li) | ZLIT23 | 2513881 | zhiyi.li@emory.edu |
| Angela Xie | JXIE82 | 2515217 | angela.xie@emory.edu |
Group 2
Originally Anushka Basu & Annie Cao, now topped up to 4
| Anushka Basu | ABASU9 | 2551669 | anushka.basu@emory.edu |
| Annie Cao | JCAO66 | 2599315 | annie.cao@emory.edu |
| Courtney Fitzgerald | CFITZG4 | 2484240 | courtney.fitzgerald@emory.edu |
| Adam Pastor | AMPASTO | 2565464 | adam.pastor@emory.edu |
Group 3
| Maura Dianno | MDIANNO | 2481848 | maura.dianno@emory.edu |
| Kush Bhatia | KBHATI7 | 2492303 | kush.bhatia@emory.edu |
| Shriya Iyer | SAIYER4 | 2493146 | shriya.iyer@emory.edu |
Group 4
| Sylvia Xing | JXING8 | 2549831 | sylvia.xing@emory.edu |
| Lucy Liu | CLIU452 | 2561533 | lucy.liu@emory.edu |
| Jessie Hao | JHAO23 | 2513298 | jessie.hao@emory.edu |
| Zoe Liu | SLIU547 | 2583239 | zoe.liu@emory.edu |
Group 5
Dhwani + Anita, Harris + Xinyi
| Dhwani Venkatarangan | DAVENKA | 2554493 | dhwani.venkatarangan@emory.edu |
| Anita Osuri | AOSURI2 | 2557540 | anita.osuri@emory.edu |
| Xinyi Wang | XWAN878 | 2549813 | xinyi.wang@emory.edu |
| Harris Wang | MWAN467 | 2551003 | harris.wang@emory.edu |
Group 6
Randomly assigned
| Shuyang Yu | SYU1025 | 2610436 | shuyang.yu@emory.edu |
| Phoebe Pan | ZPAN66 | 2630423 | ziwen.pan@emory.edu |
| Zihan Liang | ZLIAN57 | 2609381 | zihan.liang@emory.edu |
| Evelyn Shi | CSHI59 | 2609525 | evelyn.shi2@emory.edu |
Group 7
Miracle + Ahshar, now topped up to 4
| Davis Boor | DBOOR | 2556176 | davis.boor@emory.edu |
| Xipu Wang | XWAN884 | 2551008 | xipu.wang@emory.edu |
| Miracle Ephraim | MEPHRAI | 2492732 | miracle.ephraim@emory.edu |
| Ahshar Brown | AOBROW2 | 2575182 | ahshar.brown@emory.edu |
Group 8
Daniel + Howie
| Howie Brown | HJBROW5 | 2585210 | howie.brown@emory.edu |
| Maxwell Troilo | MTROILO | 2520874 | max.troilo@emory.edu |
| Daniel Nickas | DNICKAS | 2549711 | daniel.nickas@emory.edu |
Blocking recap
- Blocking involves grouping experimental units based on certain characteristics to ensure comparability between treatment and control groups.
- Blocks are formed based on variables expected to affect the outcome, and within each block, units are randomly assigned to treatment or control.
- Blocking reduces variance and increases precision by ensuring balanced groups within each block.
- Key benefits:
  - Ensures that important subgroups are equally represented in treatment and control.
  - Reduces the risk of chance imbalances on covariates that affect the outcome.
  - Particularly useful for small sample sizes or when heterogeneity is expected.
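To make the recap concrete, here is a minimal sketch of block randomisation in Python. The function name `block_randomise`, the unit IDs, and the "high"/"low" blocking covariate are all hypothetical and chosen only for illustration; the sketch assumes a simple two-arm design with half of each block treated.

```python
import random
from collections import defaultdict


def block_randomise(units, block_of, seed=None):
    """Randomly assign half of each block to treatment (1), the rest to control (0)."""
    rng = random.Random(seed)

    # Group units by the value of their blocking variable.
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of[unit]].append(unit)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treated = len(shuffled) // 2  # odd-sized blocks get one extra control unit
        for i, unit in enumerate(shuffled):
            assignment[unit] = 1 if i < n_treated else 0
    return assignment


# Hypothetical example: 8 units blocked on a made-up binary covariate.
units = list(range(8))
block_of = {u: ("high" if u < 4 else "low") for u in units}
assignment = block_randomise(units, block_of, seed=42)
```

Because randomisation happens separately inside each block, every block contributes the same number of treated and control units, whichever units the shuffle happens to pick.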
Clustering recap
- Clustering involves assigning whole groups of units to treatment and control, often due to practical constraints.
- Common in experiments where individual randomisation is impossible (e.g., classrooms, villages).
- Clustering introduces intra-cluster correlation (ICC), which measures how similar individuals within the same cluster are.
- Challenges:
  - Higher variance compared to individual randomisation.
  - Requires cluster-robust standard errors to avoid underestimating uncertainty.
- Blocking can be combined with clustering (for example, by grouping similar clusters into blocks) to further reduce variance and improve precision.
Clustering (cont.)
- Last class, we discussed the concept of clustering in experiments
- Clustering is often used when individual-level random assignment is not feasible, such as in:
  - Education (e.g., classrooms)
  - Healthcare (e.g., hospitals)
  - Community interventions (e.g., neighbourhoods)
- Clustering can be beneficial, but it also comes with challenges, such as the need for robust statistical methods to account for intra-cluster correlation
- We also discussed the importance of considering the design effect when planning a clustered experiment
- We unfortunately lose statistical power when we cluster
- Clustering is more of a necessity than a choice
- The intra-cluster correlation (ICC) is a measure of the similarity of individuals within the same cluster, compared to individuals in different clusters
- The ICC is defined as:
  - \(\text{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}\)
- The ICC ranges from 0 to 1. When it is close to 0, clusters have little influence on the outcome, so we can treat individuals as approximately independent
  - This is the ideal situation!
- When it is close to 1, units within the same cluster have nearly identical outcomes
  - This is not good, because units are so similar that the effective sample size approaches the number of clusters
- In practice, the ICC usually falls between these two extremes; the larger it is, the more we need to account for it
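The ICC and its consequences for power can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method taken from the slides: it assumes equal-sized clusters, estimates the two variance components with the usual one-way ANOVA estimator, and uses the standard design-effect formula \(1 + (m - 1) \times \text{ICC}\) for clusters of size \(m\). The function names and the simulated data are hypothetical.

```python
import random
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC, assuming equal-sized clusters."""
    m = len(clusters[0])
    cluster_means = [statistics.mean(c) for c in clusters]
    msb = m * statistics.variance(cluster_means)                 # between-cluster mean square
    msw = statistics.mean([statistics.variance(c) for c in clusters])  # within-cluster mean square
    var_between = max((msb - msw) / m, 0.0)                      # truncate negative estimates at 0
    total = var_between + msw
    return var_between / total if total > 0 else 0.0


def design_effect(cluster_size, icc):
    """Variance inflation relative to individual randomisation."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Simulated data (made up): 30 clusters of 20 units, with
# between-cluster sd = 0.5 and within-cluster sd = 1.0.
rng = random.Random(1)
cluster_effects = [rng.gauss(0, 0.5) for _ in range(30)]
data = [[b + rng.gauss(0, 1.0) for _ in range(20)] for b in cluster_effects]

icc_hat = anova_icc(data)
n_eff = effective_sample_size(30, 20, icc_hat)
```

At the two extremes the formulas behave as described above: with an ICC of 0 the effective sample size equals the total number of units, and with an ICC of 1 it equals the number of clusters.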
Clustering (cont.)
- As we’ve seen, cluster randomised trials entail a series of specific challenges for standard estimation and testing methods
- If randomisation is conducted at the cluster level, the uncertainty arising from this process is also at the cluster level
- With a sufficiently large number of clusters, cluster-robust standard errors produce confidence intervals with the correct coverage. With only a few clusters, however, they can be unreliable
- If cluster size (or any related characteristic) is linked to the magnitude of the effect, the estimate may be biased, and adjustments are required
- So, what can we do? 🤷🏻‍♂️
What to do in such situations?
- One option is to increase the sample size to account for the loss of power due to clustering
- This can be done by:
  - Adding more clusters
  - Increasing the number of units within each cluster
- However, this can be challenging in practice, as it may not always be feasible to add more clusters or units
- And this is where blocking comes in!
- Blocking similar clusters together reduces the variance of the estimated effect, which helps mitigate the loss of power due to clustering
- Imai et al. (2009) proposed a design strategy to improve the efficiency of cluster randomised trials
- The strategy has three steps:
  - First, choose the causal quantity of interest (usually the individual-level difference in means)
  - Then, identify available pre-treatment covariates likely to affect the outcome variable (the blocking variables), and, if possible, pair clusters based on the similarity of these covariates and cluster sizes
    - They show that this step is usually overlooked, and that pairing can yield efficiency gains equivalent to collecting many additional observations
  - Finally, within each pair, randomly assign one cluster to treatment and the other to control
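The pairing and assignment steps can be sketched as follows. This is a deliberately simplified illustration, not the procedure from Imai et al. (2009): it assumes a single pre-treatment covariate per cluster, pairs clusters by sorting on that covariate (using cluster size only to break ties), and leaves the last cluster unpaired if the number of clusters is odd. The function name `pair_and_assign` and the six clusters are hypothetical.

```python
import random


def pair_and_assign(clusters, seed=None):
    """Pair similar clusters, then randomise treatment within each pair.

    clusters: dict mapping cluster name -> (covariate, size).
    Returns a list of (treated_cluster, control_cluster) tuples.
    """
    rng = random.Random(seed)

    # Simplification: sort on (covariate, size) and pair adjacent clusters.
    ordered = sorted(clusters, key=lambda name: clusters[name])

    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip: first element is treated, second is control
        pairs.append((pair[0], pair[1]))
    return pairs


# Hypothetical clusters: (pre-treatment covariate, cluster size)
clusters = {
    "A": (0.10, 120),
    "B": (0.85, 300),
    "C": (0.15, 110),
    "D": (0.80, 310),
    "E": (0.50, 200),
    "F": (0.55, 190),
}
pairs = pair_and_assign(clusters, seed=7)
```

With several covariates, sorting on one score is no longer adequate; a distance-based matching of clusters would be the natural extension, which this sketch does not attempt.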